Learning Syntactic Constructions from Raw Corpora

Authors

  • Shimon Edelman
  • Zach Solan
  • David Horn
  • Eytan Ruppin
Abstract

Construction-based approaches to syntax (Croft, 2001; Goldberg, 2003) posit a lexicon populated by units of various sizes, as envisaged by Langacker (1987). Constructions may be specified completely, as in the case of simple morphemes or idioms such as "take it to the bank", or partially, as in the expression "what’s X doing Y?", where X and Y are slots that admit fillers of particular types (Kay and Fillmore, 1999). Constructions offer an intriguing alternative to traditional rule-based syntax by hinting at the extent to which the complexity of language can stem from a rich repertoire of stored, more or less entrenched (Harris, 1998) representations that address both syntactic and semantic issues, and encompass, in addition to general rules, “totally idiosyncratic forms and patterns of all intermediate degrees of generality” (Langacker, 1987, p. 46). Because constructions are by their very nature language-specific, the question of acquisition in Construction Grammar is especially poignant. We address this issue by offering an unsupervised algorithm that learns constructions from raw corpora.

Similar resources

The Impact of Different Frequency Patterns on the Syntactic Production of a 6-year-old EFL Home Learner: A Case Study

This longitudinal study investigated the impact of different Frequency Patterns (FP) on the syntactic production of a six-year-old EFL learner in a home context. Target syntactic constructions were presented using games and plays and were traced for their occurrence patterns in input and output. Following each instruction period, the constructions were measured through immediate and delayed ora...

Learning Subcategorization Frames from Corpora: a Case Study for Modern Greek

Certain Natural Language Processing (NLP) applications such as parsing and semantic processing require complete lexicons that provide subcategorization information for a word of interest, i.e. the necessary information about the set(s) of syntactic constituents the word must combine with, in order for its meaning to be fully expressed. Modern Greek presents high flexibility in the allowable ord...

Context effects in language production: models of syntactic priming in dialogue corpora

This thesis addresses the cognitive basis of syntactic adaptation, which biases speakers to repeat their own syntactic constructions and those of their conversational partners. I address two types of syntactic adaptation: short-term priming and long-term adaptation. I develop two metrics for syntactic adaptation within a speaker and between speakers in dialogue: one for short-term priming effect...

Discontinuous Verb Phrases in Parsing and Machine Translation of English and German

In this paper, we focus on the verb-particle (V-Prt) split construction in English and German and its difficulty for parsing and Machine Translation (MT). For German, we use an existing test suite of V-Prt split constructions, while for English, we build a new and comparable test suite from raw data. These two data sets are then used to perform an analysis of errors in dependency parsing, word-...

Selection Restrictions Acquisition from Corpora

This paper describes an automatic clustering strategy for acquiring selection restrictions. We use a knowledge-poor method merely based on word cooccurrence within basic syntactic constructions; hence, neither semantic tagged corpora nor man-made lexical resources are needed for generalising semantic restrictions. Our strategy relies on two basic linguistic assumptions. First, we assume that tw...

Publication date: 2004